The computing field has undergone a fundamental shift, from latency-optimized CPU designs to throughput-oriented GPU architectures. A CPU is like a high-speed courier motorcycle: it delivers a single package fast. A GPU is like a giant cargo ship: each item moves more slowly, but it carries 50,000 containers per trip.
1. Latency vs. Throughput
CPUs are designed to minimize the completion time of a single instruction sequence, relying on sophisticated techniques such as branch prediction. GPUs, by contrast, are designed to maximize work-per-second by running thousands of threads in parallel, trading single-thread speed for enormous aggregate throughput.
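The motorcycle/cargo-ship trade-off can be made concrete with a toy calculation. All numbers below are illustrative only, not measurements of real hardware:

```python
# Toy model of latency vs. throughput (illustrative numbers only).

def delivery_time(n_packages: int, latency_per_trip: float, batch_size: int) -> float:
    """Total time to deliver n_packages, moving one batch per trip."""
    trips = -(-n_packages // batch_size)  # ceiling division
    return trips * latency_per_trip

# "Motorcycle" (CPU-like): fast per trip, one package at a time -> low latency.
cpu_time = delivery_time(50_000, latency_per_trip=0.1, batch_size=1)

# "Cargo ship" (GPU-like): slow per trip, 50,000 packages at once -> high throughput.
gpu_time = delivery_time(50_000, latency_per_trip=100.0, batch_size=50_000)

print(f"motorcycle: {cpu_time:.0f} h for 50,000 packages")   # loses on bulk work
print(f"cargo ship: {gpu_time:.0f} h for 50,000 packages")   # wins on bulk work

# But for a SINGLE package, the low-latency motorcycle wins:
print(delivery_time(1, 0.1, 1) < delivery_time(1, 100.0, 50_000))
```

For bulk work the ship finishes 50x sooner despite each trip taking 1,000x longer, which is exactly why GPUs win on massively parallel workloads while CPUs win on single, latency-sensitive tasks.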
2. Transistor Allocation
At comparable price and power budgets, a GPU delivers far higher instruction throughput and memory bandwidth than a CPU. Built for highly parallel computation, the GPU devotes more of its transistors to data processing (arithmetic logic units), whereas the CPU devotes more of its transistors to data caching and flow control.
3. The Evolution of CUDA
CUDA (Compute Unified Device Architecture) was introduced by NVIDIA in 2006. It is a parallel computing platform and programming model that harnesses the GPU's power independently of graphics APIs, enabling dramatic performance gains.
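CUDA's core idea can be sketched without a GPU: a "kernel" function runs once per thread, and each thread computes a global index from its block and thread coordinates to pick the one element it handles. The following is a pure-Python simulation of that indexing scheme; the `launch` helper and function names are illustrative stand-ins, not real CUDA API calls (in CUDA C++ this would be a `__global__` kernel launched as `kernel<<<blocks, threads>>>(...)`):

```python
# Pure-Python simulation of the CUDA execution model: each simulated thread
# computes a global index i = blockIdx * blockDim + threadIdx and processes
# one element of the vectors.

def vector_add_kernel(a, b, out, block_idx, block_dim, thread_idx):
    """One simulated GPU thread: add a single pair of elements."""
    i = block_idx * block_dim + thread_idx   # global thread index
    if i < len(a):                           # guard: some threads have no work
        out[i] = a[i] + b[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Simulate launching grid_dim blocks of block_dim threads each."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(*args, block_idx, block_dim, thread_idx)

a = list(range(10))
b = [10 * x for x in a]
out = [0] * len(a)

# 3 blocks x 4 threads = 12 threads covering 10 elements (2 are masked off).
launch(vector_add_kernel, 3, 4, a, b, out)
print(out)  # [0, 11, 22, 33, 44, 55, 66, 77, 88, 99]
```

On real hardware the two loops in `launch` do not exist: all twelve threads run concurrently, which is where the throughput comes from.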
QUESTION 1
Which component consumes the majority of silicon real estate in a traditional CPU?
Arithmetic Logic Units (ALUs)
Control logic and Data Caching
Floating Point Units
Memory Controllers
✅ Correct!
Correct! CPUs prioritize latency reduction, requiring large caches and complex control logic.
❌ Incorrect
ALUs dominate GPU die area, but in CPUs, Control and Cache are the primary consumers.
QUESTION 2
What was the original purpose of the GPU before CUDA?
General purpose scientific computing
Operating system kernel management
Fixed-function hardware for 3D rendering
High-frequency trading
✅ Correct!
Yes, GPUs started as fixed-function hardware specifically for accelerating real-time 3D graphics.
❌ Incorrect
Before CUDA (2006), GPUs were restricted to graphics APIs like OpenGL or DirectX.
QUESTION 3
In the cargo ship analogy, what represents the 'Throughput'?
The speed at which the ship moves across the ocean.
The total volume of containers delivered at once.
The size of the ship's engine.
The fuel efficiency per container.
✅ Correct!
Throughput is the aggregate 'work-per-second', like the massive number of containers moved per voyage.
❌ Incorrect
Speed represents latency; throughput represents the total volume of work completed.
QUESTION 4
What is the primary trade-off made by GPUs to achieve high aggregate throughput?
Higher power consumption per unit.
Lower single-thread performance.
Reduced memory bandwidth.
Simplified mathematical precision.
✅ Correct!
GPUs trade off individual thread speed (latency) to pack thousands of threads together for total throughput.
❌ Incorrect
Actually, GPUs usually offer much higher memory bandwidth than CPUs.
QUESTION 5
Which NVIDIA software component is required to run CUDA applications?
DirectX 12
NVIDIA Driver and CUDA Toolkit
OpenGL Wrapper
Windows GDI+
✅ Correct!
Correct. The CUDA Toolkit and the NVIDIA Driver form the bridge between your code and the hardware.
❌ Incorrect
CUDA enables workloads independent of graphics APIs like DirectX or OpenGL.
Architectural Analysis: CPU vs. GPU Selection
Determine the optimal processor for a given workload scenario.
A financial firm needs to process two different tasks: 1) A complex decision-making algorithm with hundreds of nested 'if-else' statements that must finish as fast as possible for a single user. 2) A Monte Carlo simulation that runs the same simple formula 10 million times with different random inputs.
Q
Which processor (CPU or GPU) should be used for Task 1 and why?
Solution:
Task 1 should use a CPU. Because it involves complex flow control and requires low latency for a single sequence, the CPU's sophisticated branch prediction and large caches are better suited than the GPU's throughput-oriented ALUs.
Q
Which processor should be used for Task 2 and why?
Solution:
Task 2 should use a GPU. It is an 'embarrassingly parallel' task where the same operation is repeated millions of times. The GPU can devote its massive array of ALUs to process thousands of these simulations simultaneously, achieving much higher total throughput.
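The Monte Carlo workload from Task 2 can be sketched in plain Python. The payoff formula below is a toy stand-in, not the firm's actual model; the point is that every trial is independent of every other, which is what makes the task 'embarrassingly parallel' and a natural fit for one-trial-per-thread execution on a GPU (here it runs sequentially for illustration):

```python
import random

def one_trial(rng: random.Random) -> float:
    """The 'same simple formula' applied to one random input: here, the
    payoff of a toy asset move (illustrative, not a real pricing model)."""
    shock = rng.gauss(0.0, 1.0)              # random market shock
    return max(shock, 0.0)                   # keep only the upside

def monte_carlo(n_trials: int, seed: int = 42) -> float:
    """Average the formula over n_trials independent random inputs.
    Each iteration depends on nothing but its own random draw, so the
    loop could be split across thousands of GPU threads unchanged."""
    rng = random.Random(seed)
    return sum(one_trial(rng) for _ in range(n_trials)) / n_trials

estimate = monte_carlo(100_000)
print(f"Monte Carlo estimate: {estimate:.4f}")
```

Because no trial reads another trial's result, there is no synchronization cost: the GPU version is the same formula mapped over millions of threads, exactly the shape of workload CUDA was built for.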